A Novel Approach of Calculating Information Entropy in Information Extraction
Authors
Abstract
Noisy data in web pages easily causes topic drift in web information extraction. To improve extraction accuracy, a novel method for calculating mixing entropy is presented that reflects the topic information of a web page more accurately. Information blocks are considered in a multi-page site environment. Both the influence of information within the local page and the distribution of the same information across template-generated web pages are taken into account, so as to ensure the precision of the entropy calculation. The method is verified by calculating the entropy of information blocks in information extraction. Simulation results indicate that, compared with traditional methods, the novel method is superior in both the accuracy of the information entropy calculation and the discrimination between topic-related and topic-unrelated information blocks.
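The abstract does not give the formula, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes plain Shannon entropy for both components: one over the term frequencies inside a block (the local-page view) and one over how often the block occurs on each page of a template-generated site (the cross-page view). The function names, the normalisation, and the weight `alpha` are all hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy in bits of a distribution given as non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def local_entropy(tokens):
    """Entropy of the term-frequency distribution inside one information block."""
    return shannon_entropy(list(Counter(tokens).values()))


def cross_page_entropy(occurrences_per_page):
    """Entropy of a block's occurrences across pages, normalised to [0, 1].

    A value near 1 means the block is spread evenly over all pages
    (typical of template content); 0 means it appears on a single page.
    """
    n = len(occurrences_per_page)
    if n < 2:
        return 0.0
    return shannon_entropy(occurrences_per_page) / math.log2(n)


def mixed_entropy(tokens, occurrences_per_page, alpha=0.5):
    """Hypothetical weighted mix of the two views; alpha is an assumed parameter."""
    local = local_entropy(tokens)
    cross = cross_page_entropy(occurrences_per_page)
    return alpha * local + (1 - alpha) * cross
```

Under these assumptions, a navigation bar repeated on every page yields a cross-page entropy close to 1, while a block found on only one page yields 0. How the paper actually weights and combines the two sources of information is not stated in the abstract and is not reproduced here.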
Similar resources
A novel ranking method for intuitionistic fuzzy set based on information fusion and application to threat assessment
A novel ranking method based on multi-time information fusion is proposed for intuitionistic fuzzy sets (IFSs) and applied to the threat assessment problem, a multi-attribute decision making (MADM) one. This method integrates a designed intuitionistic fuzzy entropy (IFE), the closeness degree of technique for order preference by similarity to ideal solution (TOPSIS), the decision maker's (DM'...
A multi agent method for cell formation with uncertain situation, based on information theory
This paper assumes the cell formation problem as a distributed decision network. It proposes an approach based on application and extension of information theory concepts, in order to analyze informational complexity in an agent-based system, due to interdependence between agents. Based on this approach, new quantitative concepts and definitions are proposed in order to measure the amount of t...
A new approach factor-entropy with application to business costs of SMEs in Shanghai
Business cost is acknowledged as one of the priorities in SMEs research. In this study, the business cost of SMEs in Shanghai was primarily measured using Factor-Entropy analysis method. The purpose of this study is to effectively resolve the issue of simplification and assignment evaluation index system on business costs of SMEs in Shanghai. However, this study uses factor analysis to interpret t...
Two Novel Chaos-Based Algorithms for Image and Video Watermarking
In this paper we introduce two innovative image and video watermarking algorithms. The paper’s main emphasis is on the use of chaotic maps to boost the algorithms’ security and resistance against attacks. By encrypting the watermark information in a one dimensional chaotic map, we make the extraction of watermark for potential attackers very hard. In another approach, we select embedding po...
Supervised feature extraction algorithm based on improved polynomial entropy
Based on information entropy theory, a novel feature extraction algorithm based on improved polynomial entropy (IPE) is set up. Firstly, the concepts and their properties of information entropy and cross entropy (CE) are analysed and studied. On this foundation, we prove that symmetrical cross entropy (SCE) proposed here based on CE satisfies three axioms of the distance, i.e. nonnegativity, sy...
Publication date: 2013